Parquet

You can load the output of an ETL into Parquet files and save them to a shared folder location. When the ETL is run, the data is loaded into Parquet files in the given folder: each table in the data flow is loaded into its own Parquet file, and a corresponding .crc checksum file is generated for each one. To use a Parquet file as a target, add the Parquet node from the Targets panel to the data flow.

Configure a Parquet Target

From the target's Properties panel, name the new database that will be created and point to the shared folder where the files will be saved:

  • Database Name: name the new database that will be generated when the ETL is run.
  • Shared Folder Path: provide a pointer to a shared folder where the new database will be saved.
  • Create Folders: generate folders and save the database files inside them:
    • Database Name: create a folder named according to the given database name, and save the database file inside this folder.
    • Date Time: create a folder named according to the date and time at which the ETL is run, and save the database file inside this folder. If a database folder is also created, the Date Time folder will be a subfolder of it.
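The two Create Folders options compose the output path in order: the database-name folder first, then the run-timestamp subfolder inside it. The following sketch shows this composition; the function name and timestamp format are hypothetical illustrations, not the product's actual naming scheme:

```python
from datetime import datetime
from pathlib import Path


def output_folder(shared_path, db_name, use_db_folder, use_datetime_folder, run_time):
    """Hypothetical sketch of how the Create Folders options build the path."""
    path = Path(shared_path)
    if use_db_folder:
        # "Database Name" option: folder named after the database.
        path = path / db_name
    if use_datetime_folder:
        # "Date Time" option: subfolder named for the run timestamp
        # (format here is an assumption for illustration).
        path = path / run_time.strftime("%Y-%m-%d_%H-%M-%S")
    return path


run = datetime(2024, 1, 15, 9, 30, 0)
print(output_folder("/shared/etl", "Sales", True, True, run))
```

With both options enabled, the files land under the database folder's timestamped subfolder; with neither enabled, they are written directly to the shared folder.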

Finally, click ‘Connect All’ to connect the target node to the data flow. As usual, you can add a description to the node's Properties panel.

Description

Expand the Description window to add a description or notes to the node. The description is visible only from the Properties panel of the node, and does not produce any outputs. This is a useful way to document the ETL pipeline for yourself and other users.

Run the ETL

Because a Parquet target has no database or in-memory destination, the Data Model and Security stages are not relevant. Skip these steps and simply run the ETL from the Data Flow.

  • Click here to learn how to process the ETL.